Tracing pistachio nuts’ origin and irrigation practices through hyperspectral imaging

Pistachio trees have become a significant global agricultural commodity because their nuts are renowned for their unique flavour and numerous health benefits, contributing to their high demand worldwide. This study explores the application of Hyperspectral Imaging (HSI) and Machine Learning (ML) to determine pistachio nuts' geographic origin and irrigation practices, alongside predicting essential commercial quality and yield parameters. The study was conducted in two Spanish orchards and employed HSI technology to capture spectral data. It used ML models like Partial Least Squares (PLS), Support Vector Machine (SVM), and Extreme Gradient Boosting (XGBoost) for analysis. The results demonstrated high accuracy in classifying pistachios based on origin, with accuracies exceeding 94%, and in assessing water content and colour pigments, where both PLS and SVM models achieved 99% accuracy. The research highlighted distinct spectral signatures associated with different irrigation treatments, particularly in the Near-Infrared (NIR) region, with PLS showing an accuracy of 92%. However, challenges were noted in predicting fruit orientation, while predicting height location within the tree was more successful, reflecting clearer spectral distinctions. Regression models also showed promise, particularly in predicting yield (R2 = 0.89 with PLS) and percentage of blank nuts (R2 = 0.71 with PLS). The correlation analysis revealed key insights, such as an inverse relationship between blank nuts and yield, and a strong correlation between yield and split nuts. Despite challenges in predicting fruit orientation, the research showed promising results in forecasting yield and commercial quality factors, indicating the effectiveness of spectral analysis in optimising pistachio production and sustainability.


Introduction
Pistachio trees, originating from southern Central Asia, have expanded globally, with evidence suggesting their westward spread to areas like modern Syria at least 2000 years ago (Mir-Makhamad et al., 2022).Pistachio nuts, known worldwide for their distinctive flavour and use in different cuisines, have attracted attention for their numerous health benefits, such as high protein, dietary fibre, and essential vitamins and minerals (Mandalari et al., 2021).Pistachios are a nut of great importance for global agriculture, and they have consolidated as profitable crops with growing market demand.In 2022, Iran had the most significant area harvested for pistachios, with 497,484 ha; Turkey came second, with 408,709 ha, and the United States was third, with 173,207 ha (FAO, 2024).However, the production quantity by country in 2022 was reversed, with the United States producing 400,070 t, Turkey producing 241,669 t, and Iran producing 239,289 t.Spain has seen a growing interest in cultivating pistachios lately, and even though the market experienced a decline in 2022, the Spanish pistachio market has demonstrated its strength over time with a significant rise in consumption despite dealing with issues of market instability and Abbreviations: ACC, Accuracy; HIS, Hyperspectral Imaging; k-NN, k-Nearest neighbour; MAE, Mean Absolute Error; ML, Machine Learning; MSE, Mean Squared Error; NIR, Near-Infrared; PLS-DA, Partial Least Squares Discriminant Analysis; PLS-R, Partial Least Squares Regression; Rbf, Radial Basis Function; RMSE, Root Mean Squared Error; SNV, Standard Normal Variate; SVM, Support Vector Machine; XGBoost, Extreme Gradient Boosting.competition from imports from other countries (CBI -Centre for the Promotion of Imports from developing countries, 2020).
The food industry faces challenges like food fraud (Meerza et al., 2019).Specifically, the pistachio industry deals with issues such as mislabelling regarding origin and adulteration (Aykas and Menevseoglu, 2021).In addition, irrigation practices can affect pistachio nut quality (Martínez-Peña et al., 2023).These problems require developing robust, non-invasive, and efficient techniques to trace their geographical origin and determine pistachio irrigation treatments.Spectroscopy, analysing the electromagnetic radiation of an object to identify composition and properties, emerges as a solution.It involves measuring the spectrum of light absorbed, emitted, or scattered by materials and it has been used in the agri-food production sector to evaluate fruit quality (Lin and Ying, 2009), leaf water content (Rodríguez-Pérez, 2017), and disease assessment (Vélez et al., 2024;Xie et al., 2017).Hyperspectral Imaging (HSI), which integrates the advantages of imaging systems and spectroscopic instruments to provide spatially resolved spectral data (Wu et al., 2022), has also been widely used in the agri-food sector.
In contrast to traditional spectrometers' capabilities, HSI captures spectral profiles across areas rather than mere points, offering a comprehensive characterisation of absorption and reflection bands linked to objects and their prevailing conditions (Khan et al., 2022).It has demonstrated its feasibility for disease detection, classification, grading and detection of chemical attributes among various agricultural products (B.Wang et al., 2023).The principle of HSI consists of capturing and analysing images across a broad spectrum of wavelengths, offering a systematic approach to detect nuances in agricultural products, including grains, fruits, vegetables and meats (Zhu et al., 2020).During the last two decades, HSI has been widely researched, showing its promising potential for measuring quality and protecting horticultural and agricultural products.Its evolution from remote sensing, computer vision, and point spectroscopy provides superior image segmentation for defect and contamination detection, thus ensuring the quality of agricultural products and facilitating the extraction of relevant spectral signatures (Sethy et al., 2022).These signatures are vital in acquiring crucial agricultural information and detecting both external and internal quality attributes of farming products, such as pistachio nuts (C.Wang et al., 2021).The post-harvest phase is particularly critical, as it is laden with biosecurity, diagnostics, and quality assessment challenges, all of which significantly impact the product's commercial value and consumer acceptance (Palumbo et al., 2022).
Nevertheless, unlike RGB images that consist of three colour channels, HSI usually contains hundreds of spectral bands.While exploiting this wealth of information is complicated, its potential is undeniable, and new techniques are continually evolving to analyse this data better (L.Wang and Zhao, 2016).In this sense, Machine Learning (ML) techniques have facilitated and boosted the application of HSI as non-destructive, real-time techniques for assessing food quality and safety within the supply chain, from sorting to sales (Kang et al., 2022).Thus, integrating HSI with ML methodologies has revolutionised non-destructive testing across various sectors, especially in agriculture and food quality assessment.ML has opened the door to estimating pistachio mass precisely, highlighting the efficacy of integrating spectral data for enhanced agricultural produce evaluation (Saglam and Cetin, 2022), detecting damage in mango using NIR hyperspectral images, not visible to the naked eye (Vélez Rivera et al., 2014), accurately predict internal quality parameters of apples (Çetin et al., 2022), and designing specialised sensing systems via spectral filters for mango ripeness estimation through field hyperspectral imaging and ML (Gutiérrez et al., 2019).These advances demonstrate the significant impact of the fusion of HSI and ML in advancing food quality, improving agri-food production and safety inspection, and promoting a shift towards more efficient, accurate and non-destructive methodologies.
This study explores the potential of HSI technology and Machine Learning to discriminate between different irrigation treatments and geographical origins of pistachio nuts harvested from two orchards in Spain.Employing Python, libraries like Scikit and ML models such as PLS-DA, PLS-R, SVM, and XGBoost, this study hypothesises that different irrigation methods and locations significantly affect pistachio yield and commercial quality, and these effects can be identified using HSI technology.HSI images were used to build models to classify pistachios based on their origin and irrigation treatments, improving their traceability and authenticity.

Study site and plant material
In 2022, experiments were performed in two pistachio orchards located in Valladolid, Castilla y León, Spain.These orchards were situated in "Moraleja de las Panaderas" (M) (X: 347,802.062;Y: 4,570,373.104)and "La Seca" (S) (X: 341,455.662; Y: 4,589,735.763),areas in the southern part of the province.
The plant material selected for the experiments consisted of 7-yearold (Moraleja) and 15-year-old (La Seca) pistachio plants from the Pistacia vera cv.Kerman variety.This variety, one of the prevalent female cultivars in Spain, is esteemed for its exceptional nut quality and adaptability to diverse environmental conditions (Mandalari et al., 2021).These plants were grafted onto the UCB rootstock, a P. atlantica × P. integerrima hybrid.The orchard design adopted a 7 × 6 m triangular planting pattern with a NE-SW orientation, maximising sunlight interception and ensuring efficient resource utilisation.The male cultivar used in these orchards was cv.Peter.Standard agricultural practices were followed, including applying specific agrochemicals to manage weeds, pests, and diseases in order to safeguard yield potential.

Irrigation treatments
The pistachio trees (Fig. 1) were exposed to two distinctive irrigation treatments during their growth phase.The high irrigation treatment (H) delivered 50% more water than the control treatment (C).In "La Seca", trees were irrigated from January to October 2022 using a computercontrolled drip irrigation system.The duration of each irrigation episode was used to vary the amount of water applied in each treatment, depending on the season and climatic conditions.In 2022, trees at "La Seca" received total irrigation volumes of 2750 m 3 ha − 1 for the control treatment (SC) and 4660 m 3 ha − 1 for the high irrigation treatment (SH).
Conversely, in "Moraleja", irrigation was provided from May to October.The control treatment (MC) received 844 m 3 ha − 1 , while the high treatment (MH) was allocated 1161 m 3 ha − 1 .Such systematic variation in water provisioning was executed to study the impacts of differential irrigation on pistachio tree growth and productivity.

Harvest assessment and commercial quality
Upon reaching maturity in October 2022, twenty trees, evenly split between the two irrigation treatments and locations (five replicates per condition), were harvested.Agronomic and commercial quality metrics were gauged after harvesting.The yield was estimated with post-harvest samples that underwent peeling, dried for 24 h at 60 • C to mitigate mycotoxin risks and weighed (Yield, kg tree − 1 ).The size was determined by the number of pistachios in one ounce (Commercial Caliber,28.35 g).
The percentage of open husk (Split), closed husk (Non-Split), and empty nuts (Blank) were calculated per tree at a representative subsample of twenty five nuts from the yield obtained, without taking into account the height and orientation of the fruit on the tree.

Image capturing
Hyperspectral imagery was captured using the SPECIM IQ camera (SPECIM IQ camera, Specim Spectral Imaging Oy Ltd, Oulu, Finland).It is equipped with a VNIR CMOS sensor, offering a 400-1000 nm spectral range.It features a viewfinder camera with a 5 Mpix resolution, rescaled to 1280*960 pixels for image previews.Specim's proprietary software interface facilitated control and operation.Data is stored on SD cards, with a maximum capacity of 32 GB, in a format that includes Specim Dataset files, compatible with ENVI software.The device operates on a 5200 mAh Li-Ion battery, allowing for approximately 100 measurements per charge and storage cycle.It has a 4.3-inch touchscreen and 13 physical buttons for user interaction.Connectivity options include USB Type-C and WiFi.The camera's dimensions are 207 x 91 × 74 mm, weighing 1.3 kg.It has an F/number of 1.7, spectral resolution FWHM of 7 nm, and captures data in 204 spectral bands.The peak signal-to-noise ratio exceeds 400:1.For this research, the camera was used within the optimal parameters at temperatures between +5 • C and +40 • C and up to 95% non-condensing humidity.

Image processing and data extraction
The pistachios in every image corresponded to a specific sample collected before harvest, with correspond to three bunches taken from the 20 trees selected in the different locations, according to orientation (North, N; South, S; East, E and West, W) and height (High, H and Low, L).However, due to the age of the trees at the Moraleja location, some samples could not be obtained in the upper or lower part of the trees.After processing, peeling, and drying the samples at 60 • C (24h), a total of 158 images, which corresponded with 2818 pistachios, were obtained.From La Seca, 80 photos were taken (40 of them from the high irrigation treatment and 40 of the control irrigation), and from Moraleja, 77 (37 of them from high irrigation treatment and 40 of the control irrigation).
Python 3.9, along with several external libraries, was used to process images and create models.Pandas and Numpy were used to work with the data.Scikit-learn was used to create the models, while Scikit-image and numpy were used for image processing.Finally, seaborn and matplotlib were used for plotting.The reflectance of the images was corrected using white (Spectralon, NH, USA) and dark references.The image correction was done according to the following formula (Sun, 2010), where Im Dark and Im White are the pixelwise average of the white and dark references, and Im RAW is the original image.
The number of pistachios in each image varied depending on the tree, treatment and location.The mean spectra of each pistachio were extracted to analyse the images, and Scikitimage's Otsu's binarisation (Otsu, 1979) was used to remove the background.The spectra were scatter-corrected using standard normal variate (SNV) before analysis.

Model generation
ML models were trained using the extracted mean spectra of each pistachio to predict the desired parameters: origin, irrigation treatment, a combination of origin and treatment, fruit orientation in the tree, fruit height in the tree, yield, split, non-split, blank and calibre.Only one predictive value was obtained per image, meaning all pistachios from the same image have the same value for yield, split, non-split, blank and calibre predictions.In addition, a Pearson correlation matrix between the different parameters was also carried out to observe the possible correlations between the data obtained.
Three different commonly used ML models were used: Partial Least Squares Discriminant Analysis (PLS-DA) or regression (PLS-R), Support Vector Machine (SVM), and Extreme Gradient Boosting (XGBoost).A 14core processor (Intel i9 12th gen; Intel Inc., Santa Clara, CA, USA), 16 GB of RAM DDR5, and an 8 GB GPU (NVIDIA GeForce RTX 3070Ti GPU; Nvidia Inc., Santa Clara, CA, USA) were used for training the models and doing all data analysis.
During the model evaluation, in classification models, accuracy, together with the confusion matrices, show the visualisation of the performance of an algorithm.Each row represents the actual classes, and each column represents the instances of the predicted classes.Accuracy is the ratio of correct predictions to the total number of predictions made by a model.It measures how often the model's predictions are correct overall.
R 2 , Mean Absolute Error (MAE) and Mean Squared Error (MSE) were used to evaluate performance in regression models.R 2 measures the proportion of the variance in the dependent variable explained by the independent variables in a regression model, MAE measures the average absolute difference between the predicted values and the actual values, whereas MSE measures the average of the squared differences between predicted and actual values.
Before training the models, the dataset, composed of 2819 spectra was randomly divided into a training set to train the models (70% of the spectra, 1973 spectra) and an independent set for validating them (30% of the spectra, 846 spectra) using Scikitlearn's train_test_split function.Then, Scikitlearn's RandomizedsearchCV was used to optimise each model's parameters to predict with the highest accuracy and lowest Root Mean Squared Error (RMSE) using 10-fold cross-validation (Tables 1 and  2).

Origin
Differences regarding location were found in the mean spectrum, as shown in Fig. 2. Differences appear in the NIR region (970 nm), corresponding with the water content (Büning-Pfaue, 2003).These  differences were more pronounced in wavelengths 675 nm (chlorophyll) and 450 nm (carotenoids) (Wellburn, 1994) and at the "La Seca" location.The variation in spectra is clear in the predictions, where all models accuracies above 94%.Both PLS and SVM models performed similarly regarding accuracies (0.99).However, SVM outperformed PLS in predicting La Seca pistachios, while PLS performed better in predicting Moraleja pistachios but worse in predicting La Seca pistachios (Table 3 and Table S1).

Irrigation treatment
Regarding irrigation treatment, some differences were found in the mean spectrum in this study, as shown in Fig. 3.Most differences appear in the peak associated with water (970 nm) in the NIR region (Büning-Pfaue, 2003), which was more pronounced in the La Seca location, and some in wavelengths related to colour pigments such as carotenoids (480 nm).The difference in spectra suggests that the results are correct.Predictions can be made with high accuracies, where XGBoost showed the worst score with a 0.72 value and PLS the best accuracyof 0.92 (Table 4 and Table S2).

Origin and irrigation treatment
Mixing all origins when predicting irrigation treatment can lead to poorer results since it is a more restrictive approach.Therefore, a prediction was made for each irrigation treatment of each origin.Fig. 4 displays significant differences between the four combinations of origin and location across the entire wavelength range.This separation improved classification, resulting in higher accuracies than mixing different location samples.In this study, SVM produced the highest accuracyscore of 0.97, while XGBoost had the lowest, of 0.87 (Table 5 and Table S3).

Fruit orientation in the tree
Climate conditions such as sun, wind, or humidity can vary between each part of the tree.Therefore, orientation was studied and predicted using the models.However, even if Fig. 5 shows some differences in spectra, the prediction of orientation using all data obtained poor accuracies (0.35 for SVM and 0.36 for PLS and XGBoost).Therefore, since predicting from different origins and irrigation treatments with different spectra between them proved to give poor results, the prediction was also made for each origin and each origin and water supply.However,  results only increased slightly.When predicting fruit orientation in the tree by origin, the highest accuracyof 0.37 was obtained using SVM for La Seca and with PLS and SVM for Moraleja.In the case of differentiating between origin and irrigation treatment, results increased slightly compared with the ones obtained by separating by origin.In this case, some differences were found regarding the scores: better results were obtained in pistachios with high irrigation treatment (accuracies between 0.36 and 0.5), with XGBoost providing the best results: 0.5 for Moraleja and 0.45 for La Seca.In pistachios with control treatment, results were lower and almost identical for La Seca and Moraleja pistachios: accuracies between 0.36 and 0.39, with PLS providing the best results: 0.38 for Moraleja and 0.39 for La Seca (Table 6 and Table S4).Thus, irrigation treatment seems to influence the differences between the pistachio location and the prediction.No studies were found that tried to predict fruit orientation or location on the tree.

Fruit height location in the tree
Similarly, as in orientation, there are sun, wind, and humidity differences between high and low pistachios in the tree.For that reason, height location was tried to be studied and predicted.Fig. 6 shows little differences in spectra.However, the prediction of pistachio height location in the tree obtained better results than predicting fruit orientation in the tree.Using all data, accuracies of 0.61 for PLS and 0.59 for XGBoost and SVM were obtained (Table 7 and Table S5).Each prediction was made for fruit orientation in the tree, origin, irrigation treatment, and combination of origin and water treatment.When predicting fruit orientation in the tree by origin, results increased slightly.The highest accuracies were obtained using PLS for La Seca and Moraleja, with accuracies of 0.69 and 0.62, respectively.In the case of differentiating between origin and irrigation treatment, results increased slightly compared with the ones obtained by separating by origin.The same differences as in fruit orientation in the tree prediction were found regarding the scores: better results were obtained in pistachios with high irrigation treatment (accuracies between 0.5 and 0.75), with PLS providing the best results: 0.71 for Moraleja and 0.75 for La Seca.In pistachios with control treatment, results were lower and almost identical for La Seca and Moraleja pistachios: accuracies between 0.54 and 0.64, with PLS providing the best results: 0.55 for Moraleja and 0.64 for La Seca (Table 7 and Table S5).Thus, it is possible to predict fruit height location on the tree at an origin and irrigation treatment level, but not so well at a higher level due to the difference in spectra between origin and treatment combinations.ACC, Accuracy.

Yield, split, non-split, blank and calibre predictions
Yield, split, non-split, blank and calibre are significant quality factors that play a crucial role in pistachio production.In order to determine whether it was possible to predict these parameters, PLS, XGBoost, and SVM regression models were developed.The results of these models and the test set prediction plots (seaborn regplot) are presented in Table 8 and Table S6.
Accurately predicting crop yield is crucial for growers as it determines the economic benefits they will receive in the future and the necessary crop inputs.Results of a study show that with the PLS method, a high R 2 score of 0.89 can be achieved, with SVM an R 2 value of 0.88, and with XGBoost an R 2 score of 0.38.Besides, the pistachio split is very important for the farmer.The best prediction result was obtained using SVM, with a maximum R 2 value of 0.58, R 2 of 0.56, and PLS and XGBoost produced a result of 0.37 R 2 .Regarding non-split predictions, prediction decreased compared with split prediction, and the highest R 2 value was obtained using PLS (0.37).However, the results for blank, which is significant for measuring the quality of sellers, were better.The best R 2 value of 0.71 was obtained using PLS, while the poorest result was obtained using XGBoost, with an R 2 of 0.48.Regarding calibre, key for the farmer's revenue, the best prediction was obtained using SVM with an R 2 of 0.57 (Table 8 and Table S6).

Correlation between parameters
Fig. 7 displays the correlations between the yield and commercial quality parameters with the data extracted from the hyperspectral image in a Pearson correlation matrix.The matrix reveals that calibre is related to location (0.59).Moraleja samples have an average bigger calibre ( 22) than La Seca samples (20), blank was inversely correlated with yield (− 0.79), and pistachio split (− 0.72).Similarly, the split was highly correlated with yield (0.7) and slightly correlated with treatment (0.52), where control-treated samples showed an average split of 59.2, whereas high-treatment samples showed an average split of 44.9.Additionally, the yield correlated with location (0.59) and irrigation treatment (0.65).The average yield of La Seca samples was 1.9 kg tree − 1 , whereas in Moraleja samples, it was 5.2 kg tree − 1 .Lastly, pistachio height and orientation in the tree did not indicate a correlation with any of the other factors.

Discussion
This study highlights the applications of non-invasive techniques in  pistachio nuts, involving the application of HSI 400-1000 nm to accurately predict the geographic origin of pistachios and evaluate the impact of irrigation practices on their spectral signatures, alongside predicting essential quality and yield metrics.Through rigorous analysis, models employing Partial Least Squares (PLS), Support Vector Machine (SVM), and Extreme Gradient Boosting (XGBoost) demonstrated exceptional capability in distinguishing between pistachios from La Seca and Moraleja, achieving accuracies above 94%.Notably, for water content and colour pigments, both PLS and SVM models reached an accuracy of 0.99, indicating a precise differentiation based on these spectral characteristics.In this sense, the results align with the techniques presented in (Aktas ¸et al., 2022), where AlexNet and Inception V3 deep learning structures were employed, achieving test accuracies of 96.13% and 96.54%, respectively.Similarly, the application of convolutional neural networks (CNNs) such as AlexNet, VGG16, and VGG19 models demonstrated high success rates in classifying Kirmizi and Siirt pistachio types based on image data, with the VGG16 model achieving the highest classification success of 98.84% (Singh et al., 2022).Furthermore, an improved k-NN classifier, combined with image processing techniques for feature extraction and dimension reduction, also showcased a notable classification success of 94.18% in distinguishing different pistachio species (Ozkan et al., 2021) further demonstrating the successfulness of the deep learning approaches highlighted in this study.
Similarly, (Karadag and Kılıç, 2023) applied deep learning for object detection, achieving detection accuracies of 98% for open-shelled and 85% for closed-shelled pistachios.Nevertheless, the novelty of the present research is the use of machine learning to identify the Irrigation Treatments and Geographical Origin.In exploring the prediction of pistachios' geographic origin, the results are inspired by those (Soleimanipour et al., 2022), which focus on the cultivar identification of Iranian pistachios using the EfficientNet-B3 model.The study achieved good results, with a hold-out test dataset accuracy of 98.00% and average precision, recall, and accuracy values of 96.73%, 96.70%, and 96.67%, respectively.These outcomes demonstrate the powerful capability of deep learning models in accurately identifying pistachio cultivars, underscoring the technology's importance in ensuring the traceability and authenticity of agricultural products, much like the manuscript's objectives of geographic origin prediction and evaluation of irrigation practices on spectral signatures.The study's investigation into the effects of irrigation treatment on spectral signatures unveiled significant differences, with high irrigation treatments revealing distinct spectral features, particularly at the 970 nm peak in the Near-Infrared (NIR) region, which is associated with water content (Büning-Pfaue, 2003).This finding is critical, showing spectral data's potential to differentiate irrigation practices.The best accuracy observed for this differentiation was 0.92 with the PLS model, emphasising its efficacy in distinguishing between varying irrigation treatments.Moreover, the differences in the 675 nm and 450 nm bands could be due to colour pigments such as chlorophylls and carotenoids, respectively (Walsh et al., 2020;Wellburn, 1994).The research enhanced classification performance by incorporating geographic origin and irrigation treatment variables.Specifically, combining these factors led to higher accuracies, reaching up to 0.97 with the SVM model.Similar results were found in predicting the origin of Zanthoxylum bungeanum Maxim with a 97% accuracy (Ke et al., 2020) or in Jatropha curcas L. seeds (Gao et al., 2013) with a 94% correct classification.These findings underscore the subtle impact of both geographical origin and irrigation practices on the spectral signatures of pistachios.The research also revealed that irrigation levels significantly influenced the accuracy of predictions for both fruit orientation and height, with higher irrigation generally improving classification outcomes.This insight could be related to the findings of other authors (Loggenberg et al., 2018;Polder et al., 2024), who found that water influences the hyperspectral response of the plants and emphasised the importance of considering irrigation in spectral analysis for agricultural management.Furthermore, in other crops, other studies did study irrigation treatment, as in waterlogging stress of oilseed rape leaves, with a 96% accuracy.In fruits, up to a 77% accuracy was obtained when discriminating tomato plants under two irrigation regimes (Rinaldi et al., 2015), and in tomato fruits, achieved 100% and 90% accuracy differentiating between water-stressed tomatoes and non-stressed together with abiotic and biotic stress in tomatoes respectively (Susič et al., 2018).
However, the study encountered challenges in predicting the fruit's orientation within the tree, indicated by lower accuracies of 0.35 for SVM and 0.36 for both PLS and XGBoost models.This result reflects the complexity of classifying spectral variances due to fruit orientation.In contrast, predicting the height location of pistachios within the tree was more successful, particularly with PLS models for La Seca and Moraleja, suggesting a clear spectral distinction for pistachios at different heights.These results are paralleled in (Aktas ¸et al., 2022), where the accuracy difference between industrial and desktop datasets was significant, with desktop dataset training achieving 100% accuracy but dropping to 61.75% when tested with industrial dataset images (Rahimzadeh and Attar, 2022).Also encountered challenges, notably in dealing with occlusions and deformations, yet achieved a counting accuracy of 94.75%.These studies, along with the manuscript's findings, highlight the challenges of ML applications in agriculture, stressing the need for tailored approaches to different problem contexts.
Further, the study developed regression models with high R 2 scores for predicting yield and commercial quality factors.Specifically, the yield prediction achieved an R 2 of 0.88 with the SVM model, and the prediction of blank quality reached an R 2 of 0.71 with the PLS-R model.As previous research (Caporaso et al., 2018;Elmasry et al., 2012;Torres-Rodríguez et al., 2022;B. Wang et al., 2023), these models effectively forecast essential agricultural metrics, offering valuable tools for optimising pistachio production.Therefore, the integration of spectral analysis with precision agriculture technologies could optimise pistachio production and management, a vision shared by (Aktas ¸et al., 2022), who demonstrated how accurately the industrial dataset performs in classification applications, achieving up to 99.84% accuracy, emphasising the practicality of ML in industrial settings.Similarly, (Karadag and Kılıç, 2023) introduced a robotic sorting system that significantly enhances the efficiency of sorting processes, offering a glimpse into the future of automated agricultural systems.Both articles, alongside the manuscript, underscore the transformative potential of ML and deep learning technologies in industrial applications, promising improvements in efficiency, productivity, and sustainability within the agricultural sector.
Finally, correlation analysis provided deeper insights, revealing an  inverse correlation between blank and yield and a positive correlation between split and yield.This analysis aids in understanding the complex factors influencing pistachio quality and production.Nonetheless, modelling the split and non-split conditions of pistachios proved challenging, with lower R 2 scores highlighting the difficulties in using spectral data for these specific commercial quality attributes.The relationship between pistachio calibre and geographic location, with a correlation coefficient of 0.59, highlighted the influence of origin on pistachio size, with Moraleja samples typically larger.This research highlights the significant potential of combining HSI with machine learning in agricultural science, particularly for identifying the unique characteristics of pistachios based on their geographic origin, irrigation methods, and quality markers.Despite challenges in classifying certain features accurately, the findings provide a foundation for future studies to delve into the spectral analysis of diverse pistachio varieties and leverage these insights alongside precision agriculture technologies.Such efforts could markedly improve agricultural productivity and management by examining a broader range of pistachio varieties for distinctive spectral signatures and integrating this knowledge with advanced agricultural practices.The potential application of HSI through drone technology further offers promising avenues for optimising water usage, enhancing crop quality, and promoting sustainable management practices in pistachio orchards.These advancements could ultimately lead to more efficient and sustainable pistachio production, with benefits spanning increased efficiency, enhanced productivity, and reduced environmental impact.

Conclusions
This study underscores the efficacy of Hyperspectral Imaging (HSI) and Machine Learning (ML) in precisely determining the geographic origin of pistachios, assessing the impact of irrigation practices on their spectral signatures, and forecasting key quality and yield parameters. Utilising advanced modelling techniques such as Partial Least Squares (PLS), Support Vector Machine (SVM), and Extreme Gradient Boosting (XGBoost), we attained high accuracies exceeding 94%, notably in distinguishing pistachios from La Seca and Moraleja based on spectral data.This high level of precision is exemplified by accuracies of 0.99 for both PLS-DA and SVM models in assessing water content and colour pigments, highlighting HSI's potential in precision agriculture.
The research further elucidates the distinct spectral signatures attributable to irrigation treatments, especially the discernible peak at 970 nm in the NIR region, where the PLS model exhibited an accuracy of 0.92.This differentiation capability underscores HSI's application in enhancing sustainable agricultural resource management.However, predicting the orientation of the fruit on the tree presented challenges, contrasting with the more successful prediction of fruit height on the tree, indicating more explicit spectral differences associated with the height of pistachios within the tree.
Additionally, our findings from the correlation matrix reveal that   blank quality is inversely proportional to yield and pistachio split, while yield shares a strong correlation with pistachio split and a slight correlation with irrigation treatment.Notably, the location and treatment displayed a low correlation with yield, enriching our understanding of the various factors influencing pistachio commercial quality and yield.Our regression models also showed promising results in predicting yield, with an R 2 score of 0.88 for yield predictions using the SVM model and 0.71 for blank predictions with the PLS-R model, alongside an R 2 score of 0.89 for dry matter.These models prove the effectiveness of spectral analysis in forecasting vital agricultural metrics, marking a significant advancement in optimising pistachio production.Challenges remained in predicting pistachio shell split and calibre, with R 2 scores of 0.58 and 0.57, respectively, suggesting areas for future improvement.Nevertheless, the study validates HSI as a precise and effective tool for identifying pistachio characteristics based on geographic origin and irrigation practices and predicting commercial quality and yield metrics.The overall findings spotlight the significant potential of spectroscopic analysis in enhancing agricultural practices.Future research should extend to different pistachio varieties and incorporate these insights with cutting-edge agricultural technologies, potentially revolutionising pistachio production management and sustainability.

Fig. 1 .
Fig. 1.Setup of the SPECIM IQ hyperspectral camera on a tripod to capture images during indoor measurements.

Fig. 3 .
Fig. 3. Mean spectra after SNV treatment of the two irrigation treatments (high and control).

Fig. 4 .
Fig. 4. Mean spectra after SNV treatment of each of the origin and irrigation treatment combinations.

Fig. 5 .
Fig. 5. Mean spectra after SNV treatment of each fruit orientation in the three: North (N), South (S), East (E) and West (W).

Fig. 6 .
Fig. 6.Mean spectra after SNV treatment of each fruit height location in the tree (high and low).

Fig. 7 .
Fig. 7. Pearson correlation matrix with all the studied parameters.

Table 1
Classification models.

Table 3
Test set prediction results of the models for pistachio origin classification.

Table 4
Test set prediction results of the models for pistachio irrigation treatment classification.

Table 5
Test set prediction results of the models for pistachio origin and irrigation treatment combination classification.

Table 6
Test set prediction results of the models for pistachio orientation in the tree classification.

Table 8
R.Martínez-Peña et al.Current Research in Food Science 9 (2024)100835 Test set prediction results of the regression models for pistachio yield, split, non-split, blank and calibre.
(continued on next page)